Overview

Dataset statistics

Number of variables: 17
Number of observations: 96643
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 11.9 MiB
Average record size in memory: 129.0 B
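
The average record size can be cross-checked from the total size and the number of observations. The sketch below assumes 1 MiB = 1024² bytes and uses the rounded 11.9 MiB figure, so it only approximates the reported 129.0 B; the variable names are illustrative.

```python
# Figures copied from the dataset statistics above.
total_mib = 11.9            # reported total size in memory (already rounded)
n_observations = 96643

bytes_per_mib = 1024 ** 2   # assumption: binary mebibytes
avg_record_bytes = total_mib * bytes_per_mib / n_observations

# The result lands close to the reported 129.0 B; the gap comes from rounding.
print(f"{avg_record_bytes:.1f} B per record")
```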

Variable types

NUM: 15
CAT: 2

Warnings

dob_year is highly correlated with age [High correlation]
age is highly correlated with dob_year [High correlation]
mobile_likes_received is highly correlated with likes_received [High correlation]
likes_received is highly correlated with mobile_likes_received and 1 other field [High correlation]
www_likes_received is highly correlated with likes_received [High correlation]
likes_received is highly skewed (γ1 = 111.2322131) [Skewed]
mobile_likes_received is highly skewed (γ1 = 107.0512479) [Skewed]
www_likes_received is highly skewed (γ1 = 125.00983) [Skewed]
df_index has unique values [Unique]
userid has unique values [Unique]
friend_count has 1894 (2.0%) zeros [Zeros]
friendships_initiated has 2922 (3.0%) zeros [Zeros]
likes has 22042 (22.8%) zeros [Zeros]
likes_received has 24141 (25.0%) zeros [Zeros]
mobile_likes has 34290 (35.5%) zeros [Zeros]
mobile_likes_received has 29589 (30.6%) zeros [Zeros]
www_likes has 59980 (62.1%) zeros [Zeros]
www_likes_received has 36336 (37.6%) zeros [Zeros]
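
The Skewed warnings report γ1, the skewness of each variable. As a rough illustration of why a few extreme values produce such large figures, the sketch below computes the population (biased) moment estimator g1 = m3 / m2^1.5. This is an assumption about the general idea, not the report's exact estimator: pandas-based tools typically apply a small-sample bias adjustment. The function name and the sample values are made up.

```python
def skewness(values):
    """Population skewness g1 = m3 / m2**1.5 (biased moment estimator)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    return m3 / m2 ** 1.5

# Hypothetical counts: mostly small values plus one extreme outlier.
outlier_sample = [0, 0, 1, 2, 3, 5, 8, 1000]
symmetric_sample = [1, 2, 3, 4, 5]

print(skewness(outlier_sample))    # strongly positive: long right tail
print(skewness(symmetric_sample))  # zero for a symmetric sample
```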

Reproduction

Analysis started: 2021-01-24 14:06:47.663964
Analysis finished: 2021-01-24 14:07:25.313559
Duration: 37.65 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml
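
The reported duration is the difference between the two timestamps. A minimal sketch of that check, using only the values shown above:

```python
from datetime import datetime

fmt = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2021-01-24 14:06:47.663964", fmt)
finished = datetime.strptime("2021-01-24 14:07:25.313559", fmt)

# Elapsed wall-clock time in seconds, rounded as in the report.
duration = (finished - started).total_seconds()
print(round(duration, 2), "seconds")
```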

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct: 96643
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 49117.06887
Minimum: 0
Maximum: 99002
Zeros: 1
Zeros (%): < 0.1%
Memory size: 755.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 4913.1
Q1: 24420.5
Median: 48935
Q3: 73703.5
95-th percentile: 93922.9
Maximum: 99002
Range: 99002
Interquartile range (IQR): 49283

Descriptive statistics

Standard deviation: 28512.14914
Coefficient of variation (CV): 0.580493702
Kurtosis: -1.195116869
Mean: 49117.06887
Median Absolute Deviation (MAD): 24639
Skewness: 0.01663037795
Sum: 4746820887
Variance: 812942648.6
Monotonicity: Strictly increasing
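
Several of the figures above are simple functions of the others. The sketch below recomputes them from the reported df_index values; the variable names are illustrative, and small differences are expected because the inputs are rounded.

```python
# Values copied from the df_index statistics above.
minimum, maximum = 0, 99002
q1, q3 = 24420.5, 73703.5
mean, std = 49117.06887, 28512.14914

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance is the squared standard deviation

print(value_range, iqr, round(cv, 6), round(variance, 1))
```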
Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
2047                  1       < 0.1%
88766                 1       < 0.1%
68276                 1       < 0.1%
66229                 1       < 0.1%
72374                 1       < 0.1%
70327                 1       < 0.1%
92856                 1       < 0.1%
96954                 1       < 0.1%
94907                 1       < 0.1%
84668                 1       < 0.1%
Other values (96633)  96633   > 99.9%

Minimum 5 values

Value                 Count   Frequency (%)
0                     1       < 0.1%
1                     1       < 0.1%
2                     1       < 0.1%
3                     1       < 0.1%
4                     1       < 0.1%

Maximum 5 values

Value                 Count   Frequency (%)
99002                 1       < 0.1%
99001                 1       < 0.1%
99000                 1       < 0.1%
98999                 1       < 0.1%
98998                 1       < 0.1%

userid
Real number (ℝ≥0)

UNIQUE

Distinct: 96643
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1597170.923
Minimum: 1000008
Maximum: 2193542
Zeros: 0
Zeros (%): 0.0%
Memory size: 755.0 KiB

Quantile statistics

Minimum: 1000008
5-th percentile: 1060688.2
Q1: 1299125
Median: 1596245
Q3: 1895876
95-th percentile: 2133390.8
Maximum: 2193542
Range: 1193534
Interquartile range (IQR): 596751

Descriptive statistics

Standard deviation: 344021.4764
Coefficient of variation (CV): 0.215394277
Kurtosis: -1.199343009
Mean: 1597170.923
Median Absolute Deviation (MAD): 298352
Skewness: -0.0002675454002
Sum: 1.543553895e+11
Variance: 1.183507762e+11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
1443839               1       < 0.1%
1907510               1       < 0.1%
2083628               1       < 0.1%
1139607               1       < 0.1%
2024467               1       < 0.1%
2085679               1       < 0.1%
1179569               1       < 0.1%
1778481               1       < 0.1%
1651507               1       < 0.1%
1382211               1       < 0.1%
Other values (96633)  96633   > 99.9%

Minimum 5 values

Value                 Count   Frequency (%)
1000008               1       < 0.1%
1000013               1       < 0.1%
1000038               1       < 0.1%
1000059               1       < 0.1%
1000061               1       < 0.1%

Maximum 5 values

Value                 Count   Frequency (%)
2193542               1       < 0.1%
2193538               1       < 0.1%
2193522               1       < 0.1%
2193499               1       < 0.1%
2193485               1       < 0.1%

age
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 93
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.66490072
Minimum: 13
Maximum: 105
Zeros: 0
Zeros (%): 0.0%
Memory size: 755.0 KiB

Quantile statistics

Minimum: 13
5-th percentile: 15
Q1: 20
Median: 28
Q3: 48
95-th percentile: 73
Maximum: 105
Range: 92
Interquartile range (IQR): 28

Descriptive statistics

Standard deviation: 20.13183582
Coefficient of variation (CV): 0.5644719435
Kurtosis: 1.275521029
Mean: 35.66490072
Median Absolute Deviation (MAD): 10
Skewness: 1.295347135
Sum: 3446763
Variance: 405.2908136
Monotonicity: Not monotonic
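
The Median Absolute Deviation is the median of the absolute deviations from the median, which makes it far less sensitive to outliers than the standard deviation. The sketch below assumes the unscaled definition and uses a handful of made-up ages, not the dataset itself; the function name is illustrative.

```python
from statistics import median

def median_absolute_deviation(values):
    """Median of |x - median(x)|, without any scaling constant."""
    center = median(values)
    return median([abs(v - center) for v in values])

# Made-up ages; the median is 28 and the deviations are 15, 10, 8, 0, 20, 45, 77.
ages = [13, 18, 20, 28, 48, 73, 105]
mad = median_absolute_deviation(ages)
print(mad)
```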
Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
18                    5188    5.4%
23                    4393    4.5%
19                    4385    4.5%
20                    3767    3.9%
21                    3667    3.8%
25                    3629    3.8%
17                    3279    3.4%
16                    3081    3.2%
22                    3030    3.1%
24                    2827    2.9%
Other values (83)     59397   61.5%

Minimum 5 values

Value                 Count   Frequency (%)
13                    478     0.5%
14                    1920    2.0%
15                    2616    2.7%
16                    3081    3.2%
17                    3279    3.4%

Maximum 5 values

Value                 Count   Frequency (%)
105                   76      0.1%
104                   73      0.1%
103                   1036    1.1%
102                   185     0.2%
101                   155     0.2%

dob_day
Real number (ℝ≥0)

Distinct: 31
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 14.53257867
Minimum: 1
Maximum: 31
Zeros: 0
Zeros (%): 0.0%
Memory size: 755.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 7
Median: 14
Q3: 22
95-th percentile: 29
Maximum: 31
Range: 30
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 9.000553099
Coefficient of variation (CV): 0.6193362724
Kurtosis: -1.186798032
Mean: 14.53257867
Median Absolute Deviation (MAD): 8
Skewness: 0.1077744907
Sum: 1404472
Variance: 81.00995609
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=31)
Common values

Value                 Count   Frequency (%)
1                     7606    7.9%
10                    3946    4.1%
15                    3498    3.6%
5                     3445    3.6%
12                    3333    3.4%
2                     3329    3.4%
3                     3237    3.3%
17                    3204    3.3%
20                    3194    3.3%
4                     3154    3.3%
Other values (21)     58697   60.7%

Minimum 5 values

Value                 Count   Frequency (%)
1                     7606    7.9%
2                     3329    3.4%
3                     3237    3.3%
4                     3154    3.3%
5                     3445    3.6%

Maximum 5 values

Value                 Count   Frequency (%)
31                    1442    1.5%
30                    2458    2.5%
29                    2434    2.5%
28                    2877    3.0%
27                    2702    2.8%

dob_year
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 93
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1977.335099
Minimum: 1908
Maximum: 2000
Zeros: 0
Zeros (%): 0.0%
Memory size: 755.0 KiB

Quantile statistics

Minimum: 1908
5-th percentile: 1940
Q1: 1965
Median: 1985
Q3: 1993
95-th percentile: 1998
Maximum: 2000
Range: 92
Interquartile range (IQR): 28

Descriptive statistics

Standard deviation: 20.13183582
Coefficient of variation (CV): 0.01018129695
Kurtosis: 1.275521029
Mean: 1977.335099
Median Absolute Deviation (MAD): 10
Skewness: -1.295347135
Sum: 191095596
Variance: 405.2908136
Monotonicity: Not monotonic
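
dob_year and age share the same standard deviation, variance, kurtosis and MAD, and their skewness values are equal with opposite signs. That is what one would expect if age is a fixed reference year minus dob_year. The extremes are consistent with a reference year of 2013 (2013 - 2000 = 13 and 2013 - 1908 = 105), but that year is inferred here, not stated in the report. The sketch below illustrates the effect with made-up birth years; the helper name is hypothetical.

```python
def std_and_skew(values):
    """Return (standard deviation, skewness) from population central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m2 ** 0.5, m3 / m2 ** 1.5

REFERENCE_YEAR = 2013  # assumption: inferred from the min/max values above
dob_year = [1908, 1945, 1974, 1990, 1995, 2000]  # made-up sample
age = [REFERENCE_YEAR - y for y in dob_year]

std_year, skew_year = std_and_skew(dob_year)
std_age, skew_age = std_and_skew(age)

# Subtracting from a constant keeps the spread and mirrors the skewness.
print(std_age, std_year)
print(skew_age, skew_year)
```

The coefficient of variation differs between the two variables only because it divides the same standard deviation by very different means.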
Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
1995                  5188    5.4%
1990                  4393    4.5%
1994                  4385    4.5%
1993                  3767    3.9%
1992                  3667    3.8%
1988                  3629    3.8%
1996                  3279    3.4%
1997                  3081    3.2%
1991                  3030    3.1%
1989                  2827    2.9%
Other values (83)     59397   61.5%

Minimum 5 values

Value                 Count   Frequency (%)
1908                  76      0.1%
1909                  73      0.1%
1910                  1036    1.1%
1911                  185     0.2%
1912                  155     0.2%

Maximum 5 values

Value                 Count   Frequency (%)
2000                  478     0.5%
1999                  1920    2.0%
1998                  2616    2.7%
1997                  3081    3.2%
1996                  3279    3.4%

dob_month
Real number (ℝ≥0)

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.288701717
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Memory size: 755.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
Median: 6
Q3: 9
95-th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.525916743
Coefficient of variation (CV): 0.5606748264
Kurtosis: -1.238341143
Mean: 6.288701717
Median Absolute Deviation (MAD): 3
Skewness: 0.0300618739
Sum: 607759
Variance: 12.43208888
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)
Common values

Value                 Count   Frequency (%)
1                     11397   11.8%
10                    8274    8.6%
5                     8101    8.4%
8                     8095    8.4%
3                     7910    8.2%
7                     7848    8.1%
9                     7752    8.0%
12                    7696    8.0%
4                     7639    7.9%
2                     7449    7.7%
Other values (2)      14482   15.0%

Minimum 5 values

Value                 Count   Frequency (%)
1                     11397   11.8%
2                     7449    7.7%
3                     7910    8.2%
4                     7639    7.9%
5                     8101    8.4%

Maximum 5 values

Value                 Count   Frequency (%)
12                    7696    8.0%
11                    7045    7.3%
10                    8274    8.6%
9                     7752    8.0%
8                     8095    8.4%

gender
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 755.0 KiB

Value     Count   Frequency (%)
male      57239   59.2%
female    39404   40.8%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 6
Median length: 4
Mean length: 4.815454818
Min length: 4
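
The length statistics follow directly from the value counts: each label contributes its character length, weighted by how often it occurs. A small sketch of that arithmetic, using the two counts reported above (the variable names are illustrative):

```python
# Value counts copied from the gender table above.
counts = {"male": 57239, "female": 39404}

total = sum(counts.values())
# Mean label length, weighted by the number of rows carrying each label.
mean_length = sum(len(label) * n for label, n in counts.items()) / total

min_length = min(len(label) for label in counts)
max_length = max(len(label) for label in counts)

print(total, round(mean_length, 9), min_length, max_length)
```

Because "male" covers more than half of the rows, the median length is 4 as well.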

tenure
Real number (ℝ≥0)

Distinct: 2380
Distinct (%): 2.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 523.0575106
Minimum: 1
Maximum: 3139
Zeros: 0
Zeros (%): 0.0%
Memory size: 755.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 47
Q1: 224
Median: 407
Q3: 658
95-th percentile: 1535
Maximum: 3139
Range: 3138
Interquartile range (IQR): 434

Descriptive statistics

Standard deviation: 440.4878077
Coefficient of variation (CV): 0.8421402977
Kurtosis: 2.412641644
Mean: 523.0575106
Median Absolute Deviation (MAD): 207
Skewness: 1.570336249
Sum: 50549847
Variance: 194029.5087
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
300                   173     0.2%
303                   170     0.2%
272                   162     0.2%
257                   161     0.2%
242                   161     0.2%
280                   159     0.2%
297                   159     0.2%
278                   158     0.2%
285                   158     0.2%
284                   157     0.2%
Other values (2370)   95025   98.3%

Minimum 5 values

Value                 Count   Frequency (%)
1                     60      0.1%
2                     71      0.1%
3                     78      0.1%
4                     86      0.1%
5                     90      0.1%

Maximum 5 values

Value                 Count   Frequency (%)
3139                  1       < 0.1%
3128                  1       < 0.1%
3019                  1       < 0.1%
2822                  1       < 0.1%
2716                  1       < 0.1%

friend_count
Real number (ℝ≥0)

ZEROS

Distinct: 2522
Distinct (%): 2.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 192.8686713
Minimum: 0
Maximum: 4923
Zeros: 1894
Zeros (%): 2.0%
Memory size: 755.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 3
Q1: 30
Median: 80
Q3: 202
95-th percentile: 707
Maximum: 4923
Range: 4923
Interquartile range (IQR): 172

Descriptive statistics

Standard deviation: 383.1613042
Coefficient of variation (CV): 1.986643562
Kurtosis: 51.26205372
Mean: 192.8686713
Median Absolute Deviation (MAD): 62
Skewness: 6.128301581
Sum: 18639407
Variance: 146812.5851
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
0                     1894    2.0%
1                     1810    1.9%
2                     1110    1.1%
3                     856     0.9%
5                     781     0.8%
4                     742     0.8%
10                    734     0.8%
24                    727     0.8%
29                    716     0.7%
6                     710     0.7%
Other values (2512)   86563   89.6%

Minimum 5 values

Value                 Count   Frequency (%)
0                     1894    2.0%
1                     1810    1.9%
2                     1110    1.1%
3                     856     0.9%
4                     742     0.8%

Maximum 5 values

Value                 Count   Frequency (%)
4923                  1       < 0.1%
4917                  1       < 0.1%
4863                  1       < 0.1%
4845                  1       < 0.1%
4844                  1       < 0.1%

friendships_initiated
Real number (ℝ≥0)

ZEROS

Distinct: 1507
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 106.0517368
Minimum: 0
Maximum: 4144
Zeros: 2922
Zeros (%): 3.0%
Memory size: 755.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 16
Median: 45
Q3: 115
95-th percentile: 413
Maximum: 4144
Range: 4144
Interquartile range (IQR): 99

Descriptive statistics

Standard deviation: 187.7078064
Coefficient of variation (CV): 1.769964473
Kurtosis: 43.77488642
Mean: 106.0517368
Median Absolute Deviation (MAD): 36
Skewness: 5.222329218
Sum: 10249158
Variance: 35234.22058
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
0                     2922    3.0%
1                     2205    2.3%
2                     1527    1.6%
3                     1345    1.4%
4                     1338    1.4%
6                     1315    1.4%
5                     1314    1.4%
11                    1304    1.3%
8                     1296    1.3%
13                    1263    1.3%
Other values (1497)   80814   83.6%

Minimum 5 values

Value                 Count   Frequency (%)
0                     2922    3.0%
1                     2205    2.3%
2                     1527    1.6%
3                     1345    1.4%
4                     1338    1.4%

Maximum 5 values

Value                 Count   Frequency (%)
4144                  1       < 0.1%
3654                  1       < 0.1%
3594                  1       < 0.1%
3538                  1       < 0.1%
3415                  1       < 0.1%

likes
Real number (ℝ≥0)

ZEROS

Distinct: 2908
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 156.4591641
Minimum: 0
Maximum: 25111
Zeros: 22042
Zeros (%): 22.8%
Memory size: 755.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 11
Q3: 80
95-th percentile: 728
Maximum: 25111
Range: 25111
Interquartile range (IQR): 79

Descriptive statistics

Standard deviation: 576.2652841
Coefficient of variation (CV): 3.683167344
Kurtosis: 199.3027162
Mean: 156.4591641
Median Absolute Deviation (MAD): 11
Skewness: 11.00879469
Sum: 15120683
Variance: 332081.6776
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
0                     22042   22.8%
1                     6800    7.0%
2                     4357    4.5%
3                     3172    3.3%
4                     2432    2.5%
5                     1979    2.0%
6                     1753    1.8%
7                     1573    1.6%
8                     1397    1.4%
9                     1351    1.4%
Other values (2898)   49787   51.5%

Minimum 5 values

Value                 Count   Frequency (%)
0                     22042   22.8%
1                     6800    7.0%
2                     4357    4.5%
3                     3172    3.3%
4                     2432    2.5%

Maximum 5 values

Value                 Count   Frequency (%)
25111                 1       < 0.1%
21652                 1       < 0.1%
16732                 1       < 0.1%
16583                 1       < 0.1%
14799                 1       < 0.1%

likes_received
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 2661
Distinct (%): 2.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 143.331219
Minimum: 0
Maximum: 261197
Zeros: 24141
Zeros (%): 25.0%
Memory size: 755.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 8
Q3: 58
95-th percentile: 563.9
Maximum: 261197
Range: 261197
Interquartile range (IQR): 57

Descriptive statistics

Standard deviation: 1402.517934
Coefficient of variation (CV): 9.785153184
Kurtosis: 17078.86114
Mean: 143.331219
Median Absolute Deviation (MAD): 8
Skewness: 111.2322131
Sum: 13851959
Variance: 1967056.556
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
0                     24141   25.0%
1                     7166    7.4%
2                     4460    4.6%
3                     3280    3.4%
4                     2600    2.7%
5                     2310    2.4%
6                     1820    1.9%
7                     1645    1.7%
8                     1488    1.5%
9                     1321    1.4%
Other values (2651)   46412   48.0%

Minimum 5 values

Value                 Count   Frequency (%)
0                     24141   25.0%
1                     7166    7.4%
2                     4460    4.6%
3                     3280    3.4%
4                     2600    2.7%

Maximum 5 values

Value                 Count   Frequency (%)
261197                1       < 0.1%
178166                1       < 0.1%
152014                1       < 0.1%
106025                1       < 0.1%
82623                 1       < 0.1%

mobile_likes
Real number (ℝ≥0)

ZEROS

Distinct: 2385
Distinct (%): 2.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 106.7407572
Minimum: 0
Maximum: 25111
Zeros: 34290
Zeros (%): 35.5%
Memory size: 755.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 4
Q3: 46
95-th percentile: 484
Maximum: 25111
Range: 25111
Interquartile range (IQR): 46

Descriptive statistics

Standard deviation: 448.7178652
Coefficient of variation (CV): 4.203810025
Kurtosis: 358.1396129
Mean: 106.7407572
Median Absolute Deviation (MAD): 4
Skewness: 14.13021401
Sum: 10315747
Variance: 201347.7226
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
0                     34290   35.5%
1                     6163    6.4%
2                     3851    4.0%
3                     2844    2.9%
4                     2195    2.3%
5                     1744    1.8%
6                     1555    1.6%
7                     1359    1.4%
8                     1186    1.2%
9                     1114    1.2%
Other values (2375)   40342   41.7%

Minimum 5 values

Value                 Count   Frequency (%)
0                     34290   35.5%
1                     6163    6.4%
2                     3851    4.0%
3                     2844    2.9%
4                     2195    2.3%

Maximum 5 values

Value                 Count   Frequency (%)
25111                 1       < 0.1%
21652                 1       < 0.1%
16732                 1       < 0.1%
14039                 1       < 0.1%
13529                 1       < 0.1%

mobile_likes_received
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 1991
Distinct (%): 2.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 84.47543019
Minimum: 0
Maximum: 138561
Zeros: 29589
Zeros (%): 30.6%
Memory size: 755.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 4
Q3: 33
95-th percentile: 319
Maximum: 138561
Range: 138561
Interquartile range (IQR): 33

Descriptive statistics

Standard deviation: 847.6838415
Coefficient of variation (CV): 10.03467919
Kurtosis: 15322.77126
Mean: 84.47543019
Median Absolute Deviation (MAD): 4
Skewness: 107.0512479
Sum: 8163959
Variance: 718567.8951
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
0                     29589   30.6%
1                     8079    8.4%
2                     4842    5.0%
3                     3516    3.6%
4                     2861    3.0%
5                     2322    2.4%
6                     1958    2.0%
7                     1694    1.8%
8                     1465    1.5%
9                     1396    1.4%
Other values (1981)   38921   40.3%

Minimum 5 values

Value                 Count   Frequency (%)
0                     29589   30.6%
1                     8079    8.4%
2                     4842    5.0%
3                     3516    3.6%
4                     2861    3.0%

Maximum 5 values

Value                 Count   Frequency (%)
138561                1       < 0.1%
131244                1       < 0.1%
89911                 1       < 0.1%
73333                 1       < 0.1%
43410                 1       < 0.1%

www_likes
Real number (ℝ≥0)

ZEROS

Distinct: 1712
Distinct (%): 1.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 49.71835518
Minimum: 0
Maximum: 14865
Zeros: 59980
Zeros (%): 62.1%
Memory size: 755.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 6
95-th percentile: 205
Maximum: 14865
Range: 14865
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 287.2087478
Coefficient of variation (CV): 5.77671459
Kurtosis: 448.6697188
Mean: 49.71835518
Median Absolute Deviation (MAD): 0
Skewness: 16.94097656
Sum: 4804931
Variance: 82488.86481
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
0                     59980   62.1%
1                     4572    4.7%
2                     2691    2.8%
3                     1891    2.0%
4                     1371    1.4%
5                     1168    1.2%
6                     1050    1.1%
7                     861     0.9%
8                     772     0.8%
9                     731     0.8%
Other values (1702)   21556   22.3%

Minimum 5 values

Value                 Count   Frequency (%)
0                     59980   62.1%
1                     4572    4.7%
2                     2691    2.8%
3                     1891    2.0%
4                     1371    1.4%

Maximum 5 values

Value                 Count   Frequency (%)
14865                 1       < 0.1%
12903                 1       < 0.1%
11077                 1       < 0.1%
10763                 1       < 0.1%
10627                 1       < 0.1%

www_likes_received
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 1627
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 58.85574744
Minimum: 0
Maximum: 129953
Zeros: 36336
Zeros (%): 37.6%
Memory size: 755.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 2
Q3: 20
95-th percentile: 228
Maximum: 129953
Range: 129953
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 608.2756902
Coefficient of variation (CV): 10.33502617
Kurtosis: 23311.62097
Mean: 58.85574744
Median Absolute Deviation (MAD): 2
Skewness: 125.00983
Sum: 5687996
Variance: 369999.3152
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
0                     36336   37.6%
1                     8324    8.6%
2                     4984    5.2%
3                     3482    3.6%
4                     2749    2.8%
5                     2250    2.3%
6                     1854    1.9%
7                     1560    1.6%
8                     1404    1.5%
9                     1335    1.4%
Other values (1617)   32365   33.5%

Minimum 5 values

Value                 Count   Frequency (%)
0                     36336   37.6%
1                     8324    8.6%
2                     4984    5.2%
3                     3482    3.6%
4                     2749    2.8%

Maximum 5 values

Value                 Count   Frequency (%)
129953                1       < 0.1%
62103                 1       < 0.1%
39605                 1       < 0.1%
39213                 1       < 0.1%
34039                 1       < 0.1%

age_group
Categorical

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 94.8 KiB

Value     Count   Frequency (%)
21-30     28610   29.6%
10-20     24714   25.6%
31-40     12481   12.9%
51-60     9287    9.6%
41-50     8960    9.3%
61-70     6828    7.1%
71-80     2234    2.3%
>100      1525    1.6%
91-100    1201    1.2%
81-90     803     0.8%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 6
Median length: 5
Mean length: 4.996647455
Min length: 4

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the steepness of the line (its angle to the x-axis) does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
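
The definition above translates almost directly into code. The following is a minimal sketch on made-up data, using population covariance and standard deviations (the normalisation cancels in the ratio); the function name is illustrative. It also shows the invariance under shifting and positive rescaling.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)

# Shifting either variable, or rescaling it by a positive factor, leaves r unchanged.
r_transformed = pearson_r([10 * v + 3 for v in x], [0.5 * v - 7 for v in y])
print(r, r_transformed)
```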

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
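
A sketch of that recipe: replace each value by its rank, then apply the Pearson formula to the ranks. Ties receive the average of the positions they occupy, which is a common convention assumed here. The helper names and data are made up; the example uses a cubic relationship, which is monotonic but not linear.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]      # monotonic but nonlinear
print(spearman_rho(x, y))    # the ranks agree perfectly
print(pearson_r(x, y))       # below 1 because the relation is not linear
```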

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
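
The pair-counting definition can be written as a short O(n²) sketch. It implements exactly the formula stated above (often called tau-a); library implementations commonly default to a tie-corrected variant, so results may differ when the data contains ties. The function name and data are illustrative.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1      # both variables move the same way
        elif direction < 0:
            discordant += 1      # the variables move in opposite directions
    return (concordant - discordant) / len(pairs)

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
tau = kendall_tau_a(x, y)   # 8 concordant and 2 discordant pairs out of 10
print(tau)
```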

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
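
As an illustration, the sketch below computes the bias-corrected coefficient from a contingency table of counts, following the correction as it is usually stated: φ² = χ²/n is reduced by (k-1)(r-1)/(n-1), and the row and column counts are shrunk in the same spirit. It assumes every row and column has at least one observation, and it is not taken from the report's source code. The function name and the tables are made up.

```python
from math import sqrt

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corrected = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    k_corrected = k - (k - 1) ** 2 / (n - 1)
    r_corrected = r - (r - 1) ** 2 / (n - 1)
    return sqrt(phi2_corrected / min(k_corrected - 1, r_corrected - 1))

perfect = [[10, 0], [0, 10]]      # each row category maps to one column category
independent = [[5, 5], [5, 5]]    # no association at all
print(cramers_v_corrected(perfect), cramers_v_corrected(independent))
```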

Missing values

Sample

First rows

row  df_index  userid  age  dob_day  dob_year  dob_month  gender  tenure  friend_count  friendships_initiated  likes  likes_received  mobile_likes  mobile_likes_received  www_likes  www_likes_received  age_group
0  0  2094382  14  19  1999  11  male  266  0  0  0  0  0  0  0  0  10-20
1  1  1192601  14  2  1999  11  female  6  0  0  0  0  0  0  0  0  10-20
2  2  2083884  14  16  1999  11  male  13  0  0  0  0  0  0  0  0  10-20
3  3  1203168  14  25  1999  12  female  93  0  0  0  0  0  0  0  0  10-20
4  4  1733186  14  4  1999  12  male  82  0  0  0  0  0  0  0  0  10-20
5  5  1524765  14  1  1999  12  male  15  0  0  0  0  0  0  0  0  10-20
6  6  1136133  13  14  2000  1  male  12  0  0  0  0  0  0  0  0  10-20
7  8  1365174  13  1  2000  1  male  81  0  0  0  0  0  0  0  0  10-20
8  9  1712567  13  2  2000  2  male  171  0  0  0  0  0  0  0  0  10-20
9  10  1612453  13  22  2000  2  male  98  0  0  0  0  0  0  0  0  10-20

Last rows

row  df_index  userid  age  dob_day  dob_year  dob_month  gender  tenure  friend_count  friendships_initiated  likes  likes_received  mobile_likes  mobile_likes_received  www_likes  www_likes_received  age_group
96633  98993  1654565  19  15  1994  8  male  394  4538  4144  4501  15088  4435  5961  66  9127  10-20
96634  98994  2063006  20  4  1993  1  female  402  1988  332  7351  106025  7248  73333  103  32692  10-20
96635  98995  1132164  20  9  1993  10  female  699  3611  973  4507  7768  4414  6909  93  859  10-20
96636  98996  1668695  24  25  1989  4  female  182  2938  1272  6018  17765  5843  11708  175  6057  21-30
96637  98997  1458985  28  14  1985  12  female  290  2218  1618  4626  10268  4290  4250  336  6018  21-30
96638  98998  1268299  68  4  1945  4  female  541  2118  341  3996  18089  3505  11887  491  6202  61-70
96639  98999  1256153  18  12  1995  3  female  21  1968  1720  4401  13412  4399  10592  2  2820  10-20
96640  99000  1195943  15  10  1998  5  female  111  2002  1524  11959  12554  11959  11462  0  1092  10-20
96641  99001  1468023  23  11  1990  4  female  416  2560  185  4506  6516  4506  5760  0  756  21-30
96642  99002  1397896  39  15  1974  5  female  397  2049  768  9410  12443  9410  9530  0  2913  31-40